Abstract

A standard method for exploring item bias is the intergroup
comparison of item difficulties. This paper describes a refinement
and generalization of this technique. In contrast to prior approaches,
the proposed method deletes outlying items from the formulation of a
criterion for identifying items as deviant. It also extends the
mathematical framework of item difficulty comparisons to allow the
simultaneous analysis of any number of groups. As an example, the
proposed method is applied to a set of quantitative items selected
from a business school admission test.



The Identification of Biased Items

Introduction 

The study of item bias is concerned with the internal consistency of
a test. An attempt is made to identify items that behave differently
from other items presumed to be measuring the same ability. Implicit in
research on item bias is group comparison; items are biased in favor of
or against one group of test takers relative to another. Numerous
techniques have emerged to investigate item bias (for a review, see
Rudner, Getson, & Knight, 1980), but among the most commonly used is the
intergroup comparison of item difficulties (see, e.g., Angoff & Ford,
1973; Donlon, Hicks, & Wallmark, 1980). In this technique, an item's
difficulty is taken to be the z score associated with the proportion of test
takers responding correctly to the item. For a set of items, the
difficulties for one group are plotted against those for a second group. When
the items are more or less homogeneous in the ability they measure, a
line is suggested by the resulting points. This follows, since items
that are more difficult for one group will be more difficult for the
other group, and the easier items for one group will also be the easier
items for the second group.
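The conversion from proportion correct to a z-score difficulty can be sketched as follows. This is a minimal illustration, not the original study's code: the function names are hypothetical, and the sign convention (a harder item receives a larger z) is an assumption chosen so that larger values mean greater difficulty. The `delta` helper applies the linear transformation 4z + 13 that the paper uses in its illustration.

```python
from statistics import NormalDist


def z_difficulty(p_correct):
    """Normal deviate for an item answered correctly by a proportion p_correct.

    Assumed convention: the deviate cuts off an upper-tail area of p_correct,
    so items with a lower proportion correct get a larger (harder) value.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_correct)


def delta(p_correct):
    """Delta value: the linear transformation 4z + 13 of the z difficulty."""
    return 4.0 * z_difficulty(p_correct) + 13.0


# Hypothetical proportions correct for three items in one group.
for p in (0.8, 0.5, 0.2):
    print(p, round(delta(p), 2))
```

Under this convention an item answered correctly by half of a group has a delta of 13, and harder items have larger deltas.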

A line of best fit is calculated for the plotted points. Items far
removed from the line behave unexpectedly relative to most other items.
They could be described as more difficult for one of the groups than
would have been predicted by the relative performance of the two groups
on other items. One presumes that such a deviant item is sensitive to
factors to which most other items are insensitive or less sensitive. By
introducing additional conditions for the successful completion of the
item, these factors interfere with the item's expected relative difficulty.
Thus, the comparison of item difficulties distinguishes items that behave
unusually relative to the behavior of most other items, where distance
from the line of best fit is the measure of unexpected behavior. This
line itself represents an ideal, the relationship of a set of items
that are homogeneous in what they measure.

Error enters into the approximation of the ideal line when items
sensitive to extraneous factors are included in the best-fit calculations.
A better approximation is achieved if the calculations include only the
more homogeneous items. One can lessen the influence of items sensitive
to other factors by removing from the calculations items far removed from
the line suggested by the mainstream of points. A recent report presented
an algorithm for doing this (Sinnott, 1980). Basically, the algorithm
successively removes subsets of items, stopping when a line is found in
which only those points within a specified distance participate in its
derivation.

The comparison of item difficulties has thus far been restricted to
two groups. In this paper procedures are outlined that allow the
simultaneous comparison of any number of groups. The procedures incorporate
the algorithm described above, removing from the best-fit calculations
those items most likely to be sensitive to extraneous factors. After
the procedures are presented, their application will be illustrated.
First, though, the mathematical foundation for the procedures is presented.

Mathematical Background 

In this section the line of best fit is derived for a set of points
in n-space. The line sought is that which minimizes the sum of the squared
perpendicular distances of the points from the line. The discussion is
adapted from arguments presented by Pearson (1901).

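Let $A_{\alpha j}$ be the difficulty of item $\alpha$ for group $j$, and $S$ be the set
of vectors $\mathbf{A}_\alpha = (A_{\alpha 1}, \ldots, A_{\alpha n})$, $\alpha = 1, \ldots, \gamma$, where $\gamma$ is the
number of items and $n$ the number of groups. For a given group $j$, a mean
and variance are defined by

$$M_j = \sum_{\alpha=1}^{\gamma} A_{\alpha j} \Big/ \gamma \quad \text{and} \tag{1}$$

$$s_j^2 = \sum_{\alpha=1}^{\gamma} (A_{\alpha j} - M_j)^2 \Big/ (\gamma - 1). \tag{2}$$

For two groups, $j$ and $k$, a correlation coefficient is given by

$$r_{jk} = \frac{\sum_{\alpha=1}^{\gamma} (A_{\alpha j} - M_j)(A_{\alpha k} - M_k)}{(\gamma - 1)\, s_j s_k}. \tag{3}$$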

A line, $L$, in n-space can be written in the form $\mathbf{x} = \mathbf{x}' + t\mathbf{u}$, where
$\mathbf{x}'$ is a vector lying on the line, $t$ varies over the real numbers,
and $\mathbf{u}$ is a unit vector parallel to the line. Our goal is to find
$\mathbf{x}'$ and $\mathbf{u}$ for the line that best fits $S$, expressing $\mathbf{x}'$ and $\mathbf{u}$ solely in
terms of the statistics $M_j$, $s_j$, and $r_{jk}$.

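Let $p(\alpha)$ be the perpendicular distance of $\mathbf{A}_\alpha$ from $L$. As will now be
shown, $p(\alpha)$ can be expressed in terms of $\mathbf{x}'$, $\mathbf{u}$, and $\mathbf{A}_\alpha$. For a given $\mathbf{A}_\alpha$,
let $t'$ be chosen such that the vector

$$\mathbf{A}_\alpha - (\mathbf{x}' + t'\mathbf{u}) \tag{4}$$

is perpendicular to $\mathbf{u}$. The length of (4) is the perpendicular distance of
$\mathbf{A}_\alpha$ from $L$, or $p(\alpha)$. Thus

$$p^2(\alpha) = (\mathbf{A}_\alpha - (\mathbf{x}' + t'\mathbf{u})) \cdot (\mathbf{A}_\alpha - (\mathbf{x}' + t'\mathbf{u})).$$

Since $\mathbf{u}$ is perpendicular to (4),

$$p^2(\alpha) = (\mathbf{A}_\alpha - (\mathbf{x}' + t'\mathbf{u})) \cdot (\mathbf{A}_\alpha - \mathbf{x}'). \tag{5}$$

The perpendicularity of $\mathbf{u}$ and (4) is further exploited to find an
expression for $t'$. Since $\mathbf{u} \cdot (\mathbf{A}_\alpha - (\mathbf{x}' + t'\mathbf{u})) = 0$ and $\mathbf{u}$ is a unit
vector, $t' = \mathbf{u} \cdot (\mathbf{A}_\alpha - \mathbf{x}')$. Substituting this expression for $t'$ into (5)
yields

$$p^2(\alpha) = (\mathbf{A}_\alpha - \mathbf{x}') \cdot (\mathbf{A}_\alpha - \mathbf{x}') - (\mathbf{u} \cdot (\mathbf{A}_\alpha - \mathbf{x}'))^2. \tag{6}$$

Equation 6 expresses $p(\alpha)$ in terms of $\mathbf{x}'$, $\mathbf{u}$, and $\mathbf{A}_\alpha$. In terms of
$p(\alpha)$, the line desired is that which minimizes $\sum_{\alpha=1}^{\gamma} p^2(\alpha)$. This sum
can be written

$$\begin{aligned}
\sum_{\alpha=1}^{\gamma} p^2(\alpha) &= \sum_{\alpha=1}^{\gamma} \Big( (\mathbf{A}_\alpha - \mathbf{x}') \cdot (\mathbf{A}_\alpha - \mathbf{x}') - (\mathbf{u} \cdot (\mathbf{A}_\alpha - \mathbf{x}'))^2 \Big) \\
&= \sum_{\alpha=1}^{\gamma} \Big( \sum_{j=1}^{n} (A_{\alpha j} - x'_j)^2 - \Big( \sum_{j=1}^{n} u_j (A_{\alpha j} - x'_j) \Big)^2 \Big).
\end{aligned} \tag{7}$$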
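As a numerical check on the distance formula in (6), the sketch below compares it with the squared length of the residual vector in (4), computed by first locating the foot of the perpendicular. The function names and the example point are illustrative assumptions, not part of the original derivation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def squared_distance_formula(point, x0, u):
    """Equation (6): |A - x'|^2 - (u . (A - x'))^2, with u a unit vector."""
    d = [a - x for a, x in zip(point, x0)]
    return dot(d, d) - dot(u, d) ** 2


def squared_distance_via_foot(point, x0, u):
    """Squared length of (4): subtract the component of A - x' along u."""
    d = [a - x for a, x in zip(point, x0)]
    t = dot(u, d)  # the value t' that makes (4) perpendicular to u
    residual = [di - t * ui for di, ui in zip(d, u)]
    return dot(residual, residual)


# Hypothetical example: the line through the origin along (1, 1, 1).
u = [1.0 / math.sqrt(3.0)] * 3
point = (1.0, 0.0, 0.0)
print(squared_distance_formula(point, (0.0, 0.0, 0.0), u))
print(squared_distance_via_foot(point, (0.0, 0.0, 0.0), u))
```

Both functions should agree up to rounding; for this example the squared distance is 2/3.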

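The Lagrange multiplier method can be applied to minimize (7) subject to the
constraint that $\sum_{j=1}^{n} u_j^2 = 1$. This will lead to expressions for $\mathbf{x}'$ and $\mathbf{u}$ in
terms of the $M_j$, $s_j$, and $r_{jk}$ for the line that best fits $S$. The Lagrange
formula is

$$\sum_{\alpha=1}^{\gamma} \Big( \sum_{j=1}^{n} (A_{\alpha j} - x'_j)^2 - \Big( \sum_{j=1}^{n} u_j (A_{\alpha j} - x'_j) \Big)^2 \Big) + \lambda \Big( \sum_{j=1}^{n} u_j^2 - 1 \Big), \tag{8}$$

where $\lambda$ is the Lagrange multiplier.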



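Differentiating (8) first with respect to $x'_k$ results in the following
expression for each $k = 1, \ldots, n$:

$$\sum_{\alpha=1}^{\gamma} \Big( (A_{\alpha k} - x'_k) - u_k \sum_{j=1}^{n} u_j (A_{\alpha j} - x'_j) \Big) = 0,$$

which can also be written as

$$M_k = x'_k + t u_k,$$

where

$$M_k = \sum_{\alpha=1}^{\gamma} A_{\alpha k} \Big/ \gamma \quad \text{and}$$

$$t = \sum_{\alpha=1}^{\gamma} \Big( \sum_{j=1}^{n} u_j (A_{\alpha j} - x'_j) \Big) \Big/ \gamma.$$

Note that $t$ does not depend on $k$. Thus for each $k$, $M_k = x'_k + t u_k$,
which is just the statement that $\mathbf{M} = (M_1, \ldots, M_n)$ lies on the line that
minimizes $\sum_{\alpha=1}^{\gamma} p^2(\alpha)$. Hence the vector $\mathbf{x}'$ can be taken as $\mathbf{M}$.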

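Differentiating Equation 8 with respect to $u_k$, for $k = 1, \ldots, n$,
and taking $\mathbf{x}' = \mathbf{M}$ yields

$$\lambda u_k = \sum_{\alpha=1}^{\gamma} \sum_{j=1}^{n} u_j (A_{\alpha j} - M_j)(A_{\alpha k} - M_k). \tag{9}$$

In terms of the statistics $s_j$, $s_k$, and $r_{jk}$ defined earlier (with
$r_{kk} = 1$), this may be written

$$(\lambda / (\gamma - 1))\, u_k = \sum_{j=1}^{n} u_j s_j s_k r_{jk} \tag{10}$$

for $k = 1, \ldots, n$.

From (10) one may observe that $\mathbf{u}$ is an eigenvector of the symmetric matrix

$$\begin{pmatrix}
s_1^2 & s_1 s_2 r_{12} & \cdots & s_1 s_n r_{1n} \\
s_1 s_2 r_{12} & s_2^2 & \cdots & s_2 s_n r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
s_1 s_n r_{1n} & s_2 s_n r_{2n} & \cdots & s_n^2
\end{pmatrix}. \tag{11}$$

To determine which eigenvector, more information is needed about its
associated eigenvalue, $\lambda / (\gamma - 1)$.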

Multiplying each of the equations in (10) by its appropriate $u_k$ and
adding the resulting n equations together results in the following
equation:

$$\lambda / (\gamma - 1) = \sum_{j=1}^{n} u_j^2 s_j^2 + 2 \sum_{1 \le j < k \le n} u_j u_k s_j s_k r_{jk}, \tag{12}$$

using the fact that $\sum_{j=1}^{n} u_j^2 = 1$. The equation for $\sum_{\alpha=1}^{\gamma} p^2(\alpha)$
given in (7) can be written in terms of $s_j$, $s_k$, and $r_{jk}$ as follows:

$$\sum_{\alpha=1}^{\gamma} p^2(\alpha) = (\gamma - 1) \sum_{j=1}^{n} s_j^2 - (\gamma - 1) \sum_{j=1}^{n} u_j^2 s_j^2 - 2(\gamma - 1) \sum_{1 \le j < k \le n} u_j u_k s_j s_k r_{jk}.$$

Combining this expression with (12) yields

$$\sum_{\alpha=1}^{\gamma} p^2(\alpha) = (\gamma - 1) \sum_{j=1}^{n} s_j^2 - \lambda.$$

To minimize the sum of the $p^2(\alpha)$ requires choosing the greatest value
possible for the Lagrange multiplier $\lambda$. Thus, the desired eigenvector of
the matrix (11) is that with maximum eigenvalue.

The $\mathbf{x}'$ and $\mathbf{u}$ that minimize $\sum_{\alpha=1}^{\gamma} p^2(\alpha)$ have now been expressed
solely in terms of parameters derived from the set of $\mathbf{A}_\alpha$, $\alpha = 1, \ldots, \gamma$.
The vector $\mathbf{x}'$ is $\mathbf{M} = (M_1, \ldots, M_n)$, and $\mathbf{u}$ is the unit eigenvector of (11)
with maximum eigenvalue. All the points of $S$ were included in the
calculation of $L$. When the line-fitting algorithm is illustrated, the
line derived from all points is referred to as the preliminary line of
best fit. As the algorithm is carried out, a number of intermediate
lines may be calculated, each based on the points remaining after the
removal of those whose distance from the previously calculated line
exceeds some cutoff. The line ultimately resulting from application
of the algorithm will be referred to as the line of best fit. When
points are removed, the statistics given in (1), (2), and (3) must be
recomputed, based on the remaining points. Lines calculated after the
preliminary line are derived in the same way as the preliminary line, but
with appropriately adjusted parameters.
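The result above can be sketched in a few lines of code. By (3), the entry $s_j s_k r_{jk}$ of matrix (11) equals the sample covariance of groups j and k, so the line passes through the vector of group means in the direction of the leading eigenvector of the covariance matrix. The sketch below is an illustration under stated assumptions rather than the original implementation: the name `fit_line` is hypothetical, and the eigenvector is approximated by power iteration started from the all-ones direction, which assumes that direction is not orthogonal to the leading eigenvector (reasonable for difficulties that rise together across groups).

```python
import math


def fit_line(points, power_steps=500):
    """Return (means, u): a point on the best-fit line and its unit direction.

    points is a list of equal-length tuples, one tuple of group difficulties
    per item. The matrix built here is matrix (11), the sample covariance.
    """
    count = len(points)
    n = len(points[0])
    if count < 2:
        raise ValueError("at least two points are needed")
    means = [sum(p[j] for p in points) / count for j in range(n)]
    cov = [[sum((p[j] - means[j]) * (p[k] - means[k]) for p in points) / (count - 1)
            for k in range(n)] for j in range(n)]
    # Power iteration: repeated multiplication by a positive semidefinite
    # matrix converges toward the eigenvector with maximum eigenvalue.
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(power_steps):
        v = [sum(cov[j][k] * u[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # all points coincide; the direction is undefined
        u = [x / norm for x in v]
    return means, u


# Hypothetical two-group example whose points lie exactly on one line.
means, u = fit_line([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
print(means, u)
```

A library eigen-solver for symmetric matrices would serve equally well; power iteration is used here only to keep the sketch free of dependencies.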

Identifying and Analyzing Outlying Items 

Let c be the distance from the line of best fit beyond which one
considers a point's deviation possibly due to extraneous factors. This
cutoff may be empirically determined, as is shown in the next section.

The algorithm for calculating the line of best fit for a set
$S = \{(A_{\alpha 1}, \ldots, A_{\alpha n})\}_{\alpha = 1, \ldots, \gamma}$ is as follows:

1. Find the line that best fits the set
   S-R, where initially R is the empty set.

2. Determine the distances of all points in
   S from the line. The distance of a given
   point, $\mathbf{A}_\alpha$, can be derived from (6), which
   can also be written as

   $$p^2(\alpha) = \sum_{j=1}^{n} (A_{\alpha j} - M_j)^2 - \Big( \sum_{j=1}^{n} u_j (A_{\alpha j} - M_j) \Big)^2. \tag{13}$$

3. Let R' be the set of points whose distances
   from the line are greater than the cutoff
   distance c.

4. If R = R', stop. Otherwise set R equal to
   R' and begin again with step 1.
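The four steps can be sketched as a loop. This is a hedged illustration, not the program used in the study: all names are hypothetical, the leading eigenvector is approximated by power iteration from the all-ones direction, and a `max_rounds` guard is an added assumption so that the loop cannot run forever.

```python
import math


def fit_line(points, power_steps=500):
    """Means and unit direction of the best-fit line (leading eigenvector)."""
    count, n = len(points), len(points[0])
    if count < 2:
        raise ValueError("too few points remain to fit a line")
    means = [sum(p[j] for p in points) / count for j in range(n)]
    cov = [[sum((p[j] - means[j]) * (p[k] - means[k]) for p in points) / (count - 1)
            for k in range(n)] for j in range(n)]
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(power_steps):
        v = [sum(cov[j][k] * u[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        u = [x / norm for x in v]
    return means, u


def distance(point, means, u):
    """Perpendicular distance from the line, following equation (13)."""
    d = [a - m for a, m in zip(point, means)]
    along = sum(ui * di for ui, di in zip(u, d))
    return math.sqrt(max(0.0, sum(di * di for di in d) - along * along))


def trimmed_line(points, cutoff, max_rounds=100):
    """Repeat steps 1 to 4 until the removed set R no longer changes."""
    removed = set()  # R, stored as indices into points
    for _ in range(max_rounds):
        kept = [p for i, p in enumerate(points) if i not in removed]
        means, u = fit_line(kept)                       # step 1
        beyond = {i for i, p in enumerate(points)       # steps 2 and 3
                  if distance(p, means, u) > cutoff}
        if beyond == removed:                           # step 4
            return means, u, removed
        removed = beyond
    raise RuntimeError("no stable set of outliers found within max_rounds")


# Hypothetical data: five collinear items and one deviant item (index 5).
items = [(10.0, 10.0), (11.0, 11.0), (12.0, 12.0),
         (13.0, 13.0), (14.0, 14.0), (12.0, 15.0)]
means, u, outliers = trimmed_line(items, cutoff=1.0)
print(sorted(outliers))
```

Distances are recomputed for every point in S on each round, so a point removed earlier can rejoin the fit if it falls within the cutoff of a later line.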
At the conclusion of the algorithm, the set R will contain those
points considered outliers relative to the cutoff c. These items may be
further analyzed by determining the groups contributing most to their
deviance. For a given pair of groups, j and k, the line of best fit can
be projected onto the j-k plane. For outlying item $\alpha$, the distance
of $(A_{\alpha j}, A_{\alpha k})$ from the projected line can be calculated. A comparison of
these distances over all possible pairs will reflect the relative
contribution of the various pairs to the item's overall deviance.

The projection of the line of best fit onto the plane determined by the
groups j and k is given by $A + Bt$, where $B = u_k / u_j$, $A = M_k - BM_j$, and $t$ varies
over the real numbers. The distance of $(A_{\alpha j}, A_{\alpha k})$ from the projected line
can be derived from the two-dimensional analogue of (13). However, a more
useful formula is provided by

$$p_{jk}(\alpha) = (A - A_{\alpha k} + BA_{\alpha j}) \big/ \sqrt{1 + B^2}. \tag{14}$$

This formula allows for positive and negative values and thus indicates
whether $(A_{\alpha j}, A_{\alpha k})$ is above or below the projected line. The formula
follows from the minimization of $(t - A_{\alpha j})^2 + (A + Bt - A_{\alpha k})^2$, which is
just the square of the distance of $(A_{\alpha j}, A_{\alpha k})$ from a point on the
projected line.

When (14) is positive it follows that $A_{\alpha k} < A + BA_{\alpha j}$. Thus $(A_{\alpha j}, A_{\alpha k})$
is below $(A_{\alpha j}, A + BA_{\alpha j})$, which means that it is below the projected line.
In the same way it follows that when (14) is negative, $(A_{\alpha j}, A_{\alpha k})$ is
above the projected line. A point below the line suggests that the item
is unexpectedly difficult for group j. A point above the line suggests
that the item is unexpectedly difficult for group k.
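The signed distance in (14) can be sketched as a small function. The function and argument names are hypothetical, and the example values are invented for illustration; the sketch assumes $u_j$ is nonzero so that the slope B is defined.

```python
import math


def projected_signed_distance(a_j, a_k, m_j, m_k, u_j, u_k):
    """Equation (14): signed distance of (a_j, a_k) from the projected line.

    Positive means the point lies below the projected line (item unexpectedly
    difficult for group j); negative means it lies above (group k).
    """
    slope = u_k / u_j                # B
    intercept = m_k - slope * m_j    # A
    return (intercept - a_k + slope * a_j) / math.sqrt(1.0 + slope * slope)


# Hypothetical line through (13, 13) with equal direction components.
below = projected_signed_distance(14.0, 13.0, 13.0, 13.0, 0.6, 0.6)
above = projected_signed_distance(13.0, 14.0, 13.0, 13.0, 0.6, 0.6)
print(below, above)
```

In this example the first point is harder than expected for group j and yields a positive value; the second yields the same magnitude with a negative sign.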

Illustration of Procedures 

Item data for the examples presented below were selected from those
used in Sinnott (1980). In this earlier work, the object of study was
the Graduate Management Admission Test. A stratified sample of some
5,000 individuals taking the test in January 1977 provided the item
data. Stratification was over a number of variables, including sex, age,
ethnicity or race, and language fluency. Here groups varying in age are
examined. Only one section of the test form is considered: problem
solving. This was a 30-item, multiple-choice section presenting
self-contained mathematical problems. An item's delta value is taken as its
difficulty. This is just a linear transformation of the z score, given
by $\Delta = 4z + 13$.

Figure 1 is a frequency distribution of the item distances from the
preliminary line of best fit for a comparison involving three groups:
randomly selected test takers between the ages 20 through 22, 35 through
39, and 40 through 65. There were about 1,450 individuals in the youngest
group and 425 in each of the older groups. The numbers in Figure 1
refer to item numbers. Item 15 was greater
than 1.6 delta units from the line.

Insert Figure 1 here

For the three-dimensional comparison, Figure 2 displays the results
of calculating the line of best fit for a number of different item
cutoffs. As smaller cutoffs are taken, the distribution of points within
the cutoff distance begins to assume a configuration more consistent with
the theoretically expected normal distribution of items about the line of
best fit. The distribution presented at the top of the figure is
associated with the line in which items within 1.5 units are the only
items participating in its derivation. For this cutoff, one iteration of
the algorithm was required, since the line resulting from the removal of
Item 15 was within 1.5 units of all the remaining points.

Insert Figure 2 here

In contrast, a cutoff of 1.0 resulted in the calculation of five
lines, as illustrated in Table 1. The intercept and direction listed
first are those associated with the preliminary line of best fit.
Initially, items 15, 20, 24, and 29 were removed since they were greater
than 1.0 units from the preliminary line. The reader may verify this by
referring to Figure 1.

Figure 1. Frequency distribution of item distances
from the preliminary line of best fit for the three-group comparison

The line was then refitted relative to the remaining 26 points. The
direction of the new line appears in Table 1 as (.60, .58, .54). The
line contains the vector (13.65, 13.57, 13.72). All item distances were
recalculated relative to the new line. In addition to the points previously
removed, items 11 and 27 were greater than 1.0 units from the new
line. Hence, for the next line derivation, items 11, 15, 20, 24, 27, and
29 were removed.

Item distances were calculated again. Of the removed points, only
Item 29 was found to be within 1.0 units of the third line, and one
additional point, Item 19, was more than 1.0 units from the line.
Hence, the fourth line was based on the removal of items 11, 15, 19, 20,
24, and 27. In addition to these six items, Item 30 was more than 1.0
units from the fourth line. Hence, a fifth line was required. However,
this was the final line, since those items greater than 1.0 units from it
were the same as those previously removed. The distances given in
Figure 2 for the distribution associated with the 1.0 cutoff are relative
to this fifth and final line.
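The stopping rule in this walk-through can be traced with a short sketch. The item sets are those reported in Table 1 for the 1.0 cutoff; the variable and function names are hypothetical. The loop stops at the first line whose set of items beyond the cutoff equals the set that was excluded when that line was fitted.

```python
# Items more than 1.0 delta units from each successive line (Table 1).
beyond_cutoff = [
    {15, 20, 24, 29},
    {11, 15, 20, 24, 27, 29},
    {11, 15, 19, 20, 24, 27},
    {11, 15, 19, 20, 24, 27, 30},
    {11, 15, 19, 20, 24, 27, 30},
]


def final_line_number(beyond_by_line):
    """Return the 1-based number of the line at which the algorithm stops."""
    removed = set()  # nothing is excluded when the preliminary line is fitted
    for line_number, beyond in enumerate(beyond_by_line, start=1):
        if beyond == removed:
            return line_number
        removed = beyond
    return None  # the listed lines never reached a stable set


print(final_line_number(beyond_cutoff))  # prints 5
```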



Insert Table 1 here 



As can be seen in Figure 2, little new information about outliers is
added as the cutoff drops below 1.1 delta units. By the 1.1 cutoff, the
eight asterisked items have distinguished themselves relative to the main
cluster of items, and these items also emerge as outliers when smaller
cutoffs are taken.
TABLE 1

Results of the Algorithm
Applied to Three Groups

Intercept                u                  Items removed

(13.92, 13.97, 14.08)    (.59, .59, .55)    15, 20, 24, 29
(13.65, 13.57, 13.72)    (.60, .58, .54)    11, 15, 20, 24, 27, 29
(13.53, 13.35, 13.50)    (.62, .58, .54)    11, 15, 19, 20, 24, 27
(13.63, 13.37, 13.49)    (.62, .57, .53)    11, 15, 19, 20, 24, 27, 30
(13.56, 13.32, 13.39)    (.63, .58, .53)    11, 15, 19, 20, 24, 27, 30
An examination of a plot of item difficulties for three groups
randomly selected from the same pool of test takers suggests that cutoffs
below about 1.0 units are likely to remove items that deviate from the
main cluster of points for reasons other than item inhomogeneity.
Figure 3 displays the item distances from a preliminary line of best fit
for a delta plot involving three groups of Caucasians, each with about
850 test takers. As can be seen, Item 29, the most deviant item, lies
between .8 and .9 delta units from the line. Since the three groups were
randomly sampled from the same population, this deviation cannot be
attributed to factors that discriminate between the groups.



Insert Figure 3 here 



For the age-group comparison, the line of best fit for the 1.0
cutoff was projected onto each of the two-dimensional planes defined by
the different pairs of groups. For a given pair, j and k, the distance
of $(A_{\alpha j}, A_{\alpha k})$ from the projected line was calculated for each item $\alpha$ in
the set of outlying items. The results appear in Table 2. A positive
value indicates that the item was unexpectedly difficult for the first
group listed. A negative value indicates it was unexpectedly easy for
the first group. As can be seen, all but one of the outlying items were
unexpectedly difficult for older individuals when compared to younger
test takers.



Insert Table 2 here 
Figure 3. Frequency distribution of item distances
from the line of best fit for a comparison of
three groups sampled from the same population
TABLE 2

Distance from Projected Lines
in a Three-Group Comparison

Groups compared      Item Number

35-39 vs. 20-22
40-65 vs. 20-22
40-65 vs. 35-39
A fourth age group was added to the other age groups to yield a
four-dimensional delta plot, the results of which are summarized in
Figure 4. The fourth group was composed of about 575 test takers between
the ages 30 and 34. The first distribution in Figure 4 displays the item
distances from the preliminary line of best fit. With a few exceptions,
there is considerable similarity in the distributions of Figures 2 and 4.
Notable exceptions are items 25 and 30, both of which display considerably
more deviant behavior in the four-dimensional analysis.

Insert Figure 4 here

For the four-dimensional analysis, Table 3 presents data similar to
Table 2. Again, the line derived from the 1.0 cutoff was used. A
comparison of Tables 2 and 3 reveals considerable overlap in their data.
Among the new information emerging from Table 3 is the following. Item 30
was unexpectedly easy for the middle-age groups relative to both older
and younger test takers, and Item 25 was unexpectedly easy for the 40- to
65-year-olds relative to the 30- to 34-year-olds.

Insert Table 3 here



A content analysis of outlying items may suggest reasons for their
deviant behavior. However, information gathered from a single item must
be interpreted cautiously since limitations in the methodology may
lead to spurious data. Information extracted from a set of similarly
behaving outliers is more reliable. However, there are shortcomings in
this approach also, the foremost being that the reasons for an individual
item's outlying behavior may be obscured in an aggregate analysis or
never pursued because other items sensitive to the same factors do not
appear on the test.

TABLE 3

Distance from the Projected Lines
in a Four-Group Comparison

                              Groups compared

Item      30-34    35-39    40-65    35-39         40-65    40-65
Number     vs.      vs.      vs.      vs.           vs.      vs.
          20-22    20-22    20-22    30-34         30-34    35-39

  2         .3      1.0       .9       .7            .6       0
 11         .7       .8      1.3       .1            .6       .5
 15        1.0      1.7      1.8       .6            .8       .2
 19         .6      1.0      1.1       .4            .5       .1
 20         .6      1.0      1.5       .4            .9       .5
 24         .9      1.7      1.3       .7            .4      -.2
 25         .6       .1      -.5      -.5          -1.1      -.6
 27        1.1      1.3      1.0    [illegible]       0      -.2
 30       -1.0      -.4       .7       .6           1.6      1.0

In the age analysis, the set of items found to be unexpectedly
difficult for older test takers did share a common characteristic. Of
the thirty items in the problem-solving section, 17 were word problems,
posing their questions in the context of some real-world situation. In
contrast, the other 13 were more abstract, involving for the most part
only mathematical concepts. Of the 13 non-word problems, eight appeared
as unexpectedly difficult for 35- to 65-year-olds. The non-word problems
deal with concepts seldom encountered in their purity outside of formal
academic training. Their appearance as unexpectedly difficult for the
older test takers may be due to a deterioration in a test taker's
ability to manipulate these concepts, a deterioration correlated with the
number of years elapsed since leaving school.

Discussion

In this paper a strategy has been presented for studying item bias
using the intergroup comparison of item difficulties. In contrast
to prior applications of this approach, the proposed method allows the
simultaneous comparison of any number of groups, thus avoiding the
awkwardness of numerous pair-wise comparisons. Furthermore, the role of
deviant items in the formulation of criteria on which biased items are
distinguished is lessened. The procedures outlined and illustrated in
this paper allow for a more efficient and reliable application of item
difficulty comparisons to the study of item bias.
The approach is based on an algorithm that ensures that the best-fitting
line for an n-dimensional plot of item difficulties is derived solely from
items lying within a specified distance of the line. The line that best
fits these items is shown to be that which intersects the vector
$\mathbf{M} = (M_1, \ldots, M_n)$ and lies in the direction $\mathbf{u}$, where $\mathbf{u}$ is the unit
eigenvector with maximum eigenvalue of the symmetric matrix given in (11)
and $M_j$ is the mean of the item difficulties experienced by group j.

It appears that no information is lost when additional dimensions
are added to an analysis. Sinnott (1980) performed two-dimensional
comparisons of the age groups studied here. The findings of that
investigation were contained in the results of both the three- and
four-dimensional comparisons. Furthermore, the findings of the
three-dimensional analysis are repeated in the four-dimensional analysis,
as can be seen by comparing Tables 2 and 3.

The limitations in the proposed strategy stem primarily from the
assumption underlying the use of the inverse normal transformation as a
measure of item difficulty. One bases the use of this transformation
on the existence of a level of ability above which success on the item
is achieved and below which, failure. No item in practice has such
perfect discrimination. A more accurate model assumes that test takers
over the range of ability may respond correctly to the item, but their
chances of correct response improve with ability. Using this model,
Lord (1977) has illustrated how items sensitive to the same dimension
may deviate from linearity in a plot of their z scores.
Empirically, the algorithm seems to be a useful tool for the study of
item bias. Several theoretical issues remain to be explored, however.
In particular, the resulting line may be only one of several lines that
satisfy the criterion of resulting from the consideration of all points
lying within a specified distance. One might wish to show that the
algorithm uncovers that line with the most points contributing to its
calculation. Also, it is theoretically possible that the algorithm might
never lead to a solution, but circle endlessly. However, this seems
unlikely to happen on any real data set.